Silent Vulnerable Dependency Alert Prediction with Vulnerability Key Aspect Explanation
Open-source software is widely used for its convenience. For well-intentioned reasons, open-source maintainers often fix vulnerabilities silently, leaving users who are unaware of the updates exposed to threats. Previous work focuses on black-box binary detection of silent dependency alerts, which suffers from high false-positive rates and forces open-source software users to analyze and interpret the AI predictions themselves. Explainable AI has emerged as a complement to black-box AI models, providing details in various forms to explain AI decisions. Noticing that no existing technique can discover silent dependency alerts in a timely manner, in this work we propose a framework that combines an encoder-decoder model with a binary detector to provide explainable silent dependency alert prediction. Our model generates four types of vulnerability key aspects, namely vulnerability type, root cause, attack vector, and impact, to enhance the trustworthiness of, and users' acceptance of, alert predictions. Through experiments with several models and inputs, we confirm that CodeBERT with both commit messages and code changes achieves the best results. Our user study shows that explainable alert predictions help users identify silent dependency alerts more easily than black-box predictions. To the best of our knowledge, this is the first research work on applying Explainable AI to silent dependency alert prediction, which opens the door to related domains.
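The abstract above describes a binary detector built on CodeBERT that takes both commit messages and code changes as input. Purely as an illustration, and not the authors' released implementation, the hedged Python sketch below sets up such a classifier with Hugging Face Transformers; the checkpoint name, the example commit, and the way the message and diff are paired are assumptions, and the encoder-decoder that generates the key-aspect explanations is omitted.

# Minimal sketch (assumptions noted above) of a CodeBERT-based binary detector
# over a commit message paired with its code change; the classification head is
# randomly initialized and would need fine-tuning on labeled silent-fix commits.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "microsoft/codebert-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical commit used only for illustration.
commit_message = "fix buffer overflow in packet parser"
code_change = (
    "- memcpy(buf, data, len);\n"
    "+ memcpy(buf, data, min(len, sizeof(buf)));"
)

# Pair the natural-language message with the code diff, since the abstract
# reports that using both inputs gives the best results.
inputs = tokenizer(commit_message, code_change, truncation=True,
                   max_length=512, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
prob_silent_fix = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(silent vulnerability fix) = {prob_silent_fix:.3f}")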
Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names?
Recent breakthroughs in pre-trained code models, such as CodeBERT and Codex,
have shown their superior performance in various downstream tasks. The
correct and unambiguous use of APIs by these code models is crucial for
achieving the desired program functionality, requiring them to learn the fully
qualified names of various APIs both structurally and semantically. Recent studies
reveal that even state-of-the-art pre-trained code models struggle with
suggesting the correct APIs during code generation. However, the reasons for
such poor API usage performance are barely investigated. To address this
challenge, we propose using knowledge probing as a means of interpreting code
models, which uses cloze-style tests to measure the knowledge stored in models.
Our comprehensive study examines a code model's capability of understanding API
fully qualified names from two different perspectives: API call and API import.
Specifically, we reveal that current code models struggle with understanding
API names, with pre-training strategies significantly affecting the quality of
API name learning. We demonstrate that natural language context can assist code
models in locating Python API names and in generalizing Python API name knowledge to
unseen data. Our findings provide insights into the limitations and
capabilities of current pre-trained code models, and suggest that incorporating
API structure into the pre-training process can improve automated API usage and
code representations. This work offers significant implications for advancing
code intelligence practices and directions for future studies. All experimental
results, data, and source code used in this work are available at
\url{https://doi.org/10.5281/zenodo.7902072}.
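The knowledge probing described above is a cloze-style test over API fully qualified names. As a rough sketch only, and not the code from the Zenodo archive, the following Python snippet probes a masked-language-model variant of CodeBERT with two hypothetical prompts, one from the API-import perspective and one from the API-call perspective; the checkpoint name and the example prompts are assumptions.

# Minimal cloze-style probing sketch (assumptions noted above): ask the model
# to fill in a masked piece of an API fully qualified name.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")  # assumed checkpoint

probes = [
    "from os.<mask> import join",                       # API import view; expects "path"
    "import numpy as np\nx = np.<mask>([1, 2, 3])",     # API call view; expects e.g. "array"
]

for probe in probes:
    print(probe)
    for pred in fill_mask(probe, top_k=3):
        print(f"  {pred['token_str'].strip():>10}  score={pred['score']:.3f}")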